(57) ABSTRACT

A visual Web site analysis program, implemented as a 
collection of software components, provides a variety of 
features for facilitating the analysis, management and load- 
testing of Web sites. A mapping component scans a Web site 
over a network connection and builds a site map which 
graphically depicts the URLs and links of the site. Site maps 
are generated using a unique layout and display methodol- 
ogy which allows the user to visualize the overall architec- 
ture of the Web site. Various map navigation and URL 
filtering features are provided to facilitate the task of iden- 
tifying and repairing common Web site problems, such as 
links to missing URLs. A dynamic page scan feature enables
the user to include dynamically generated Web pages within
the site map by capturing the output of a standard Web 
browser when a form is submitted by the user, and then 
automatically resubmitting this output during subsequent 
mappings of the site. An Action Tracker module detects user 
activity and behavioral data (link activity levels, common 
site entry and exit points, etc.) from server log files and then 
superimposes such data onto the site map. A Load Wizard 
module uses this activity data to generate testing scenarios 
for load testing the Web site. 

32 Claims, 32 Drawing Sheets 

DOCUMENT-IDENTIFIER: US 6549944 B1

TITLE: Use of server access logs to generate scripts and scenarios for exercising 
and evaluating performance of web sites 

Detailed Description Text (136): 

During the load testing process, each Vuser monitors the Web site's responses to 
the client requests submitted by that Vuser, and records various performance-related
characteristics of these responses. These characteristics include, for
example, response times to individual client requests, timeout events, and error 
events. Following the load testing process, the user is presented with a set of 
graphical reports that allow the user to evaluate the site's performance. Using these
reports, the user can, for example, compare response times of different site 
components (Web servers, CGI scripts, APIs, proxy servers, etc.) to identify 
bottlenecks and other performance problems. 
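
The per-request measurements described above (response times, timeout events, error events) and the comparison across site components might be sketched as follows. This is a minimal illustration only, not the patented implementation; the Sample record, the component labels, and the summarize and slowest_component helpers are hypothetical names introduced here.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    """One response observed by a virtual user (hypothetical record layout)."""
    component: str          # e.g. "web_server" or "cgi_script" (assumed labels)
    response_ms: float      # response time for a single client request
    timed_out: bool = False
    error: bool = False


def summarize(samples):
    """Group samples by component and compute simple per-component metrics."""
    grouped = defaultdict(list)
    for sample in samples:
        grouped[sample.component].append(sample)
    report = {}
    for component, group in grouped.items():
        report[component] = {
            "requests": len(group),
            "mean_ms": mean(s.response_ms for s in group),
            "timeouts": sum(s.timed_out for s in group),
            "errors": sum(s.error for s in group),
        }
    return report


def slowest_component(report):
    """Return the component with the highest mean response time."""
    return max(report, key=lambda name: report[name]["mean_ms"])


samples = [
    Sample("web_server", 40.0),
    Sample("web_server", 60.0),
    Sample("cgi_script", 300.0),
    Sample("cgi_script", 500.0, timed_out=True),
]
report = summarize(samples)
```

Comparing the mean_ms values across entries of the report is one simple way to flag a likely bottleneck; a real report would probably also examine percentiles and error rates.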

Current US Original Classification ( 1 ) : 

(57) ABSTRACT

A system is disclosed for displaying information pertaining 
to the usage of Web pages. The system comprises first and 
second Web sites. The first Web site comprises plural 
Web-component files, each having a name in a Web-site 
directory. The second Web site comprises plural statistics 
files, each containing usage information about a corresponding
Web-component file or sub-directory of Web-component
files. The system further comprises a computing device that
has a display screen, is operable by a user, and is in 
communication with the first and second Web sites. The 
computing device is operated under the control of Web- 
browser software effective for displaying, on the screen, 
Web components of the respective Web sites. Significantly, 
the computing device is effective for requesting and 
retrieving, from either of the Web sites, data that correspond 
to user-designated Web components, and it is effective for 
directing a data request to either of the Web sites in response 
to user-designation of a Web component from the other Web 
site. 

U Claims, 6 Drawing Sheets 

DOCUMENT-IDENTIFIER: US 6449604 B1

TITLE: Method for characterizing and visualizing patterns of usage of a web site by 
network users 



Brief Summary Text (16) : 

Other software tools provide reports, in the form of HyperText documents, on the 
usage of selected (such as the most popular) pages. Information from these reports 
can be displayed via the user-side browser, and links are provided for viewing the 
selected Web pages. However, these software tools also fail to provide convenient 
access from a Web page to the statistics that pertain to it. 

Detailed Description Text (25) : 

Hostname: The user who is accessing usage data may wish to filter out his own 
accesses to the Web site, because they might otherwise skew the statistics.
Moreover, filtering on this field may be desirable in order to focus specifically 
on internal or on external visitors. 
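
Filtering on the hostname field, as described above, might look like the following sketch. The record layout (a dict with "hostname" and "url" keys), the function name, and the idea of recognizing internal visitors by a domain suffix are all assumptions made for illustration; the source does not specify them.

```python
def filter_by_hostname(records, own_hosts=(), internal_suffix=None, visitors="all"):
    """Filter usage records on the hostname field.

    own_hosts:       hostnames to drop entirely (e.g. the analyst's own machine)
    internal_suffix: domain suffix treated as "internal" (an assumed convention)
    visitors:        "all", "internal", or "external"
    """
    own = {host.lower() for host in own_hosts}
    kept = []
    for record in records:
        host = record["hostname"].lower()
        if host in own:
            continue  # skip the analyst's own accesses
        if internal_suffix is not None and visitors != "all":
            is_internal = host.endswith(internal_suffix.lower())
            if (visitors == "internal") != is_internal:
                continue  # record belongs to the group not being studied
        kept.append(record)
    return kept


records = [
    {"hostname": "admin-pc.example.com", "url": "/index.html"},
    {"hostname": "lab3.example.com", "url": "/docs.html"},
    {"hostname": "dialup42.isp.net", "url": "/index.html"},
]
without_self = filter_by_hostname(records, own_hosts=["admin-pc.example.com"])
external_only = filter_by_hostname(
    records, internal_suffix=".example.com", visitors="external"
)
```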

ABSTRACT



A method and apparatus for detecting, storing and retrieving 
information, including duration of view time, concerning
advertisements included with Web pages seen by a particular
user and thereafter using the stored information in control- 
ling access of that user to subsequent Web pages. 

3 Claims, 9 Drawing Sheets 

DOCUMENT-IDENTIFIER: US 6321256 B1

TITLE: Method and apparatus for controlling client access to documents 
Drawing Description Text (12): 

FIG. 10 is a flow chart illustrating how the user profile database of FIG. 9 is 
used to restrict the access of a user to another web page located on the server 
based upon the number of advertisements viewed on the web page according to a 
preferred embodiment of the present invention; and 

Drawing Description Text (13) : 

FIG. 11 is a flow chart illustrating the use of the user profile database,
and the advertisement database, of FIG. 9 to dynamically customize a requested web
page according to prior recorded viewing preferences of a user in accordance with 
the teachings of the present invention. 

Detailed Description Text (51) : 

Reference now being made to FIG. 10, a flow chart is shown illustrating how the 
User Profile Database 902 of FIG. 9 is used to restrict the access of a user to 
another web page located on the server 188 based upon the number of advertisements 
viewed on the web page 194 according to a preferred embodiment of the present 
invention. The method begins at step 1000 upon the request by the user of a new web 
page located on the server 188 via a new URL entry, hyper-link, or the like. Once 
the server 188 has received this request, it retrieves the prior received and 
recorded information (e.g. web page 194) to determine whether the user has viewed a 
pre-determined number of the advertisements A-D 606-612 (Step 1002). If the user
has viewed the pre-determined number, then the requested web page is retrieved
(Step 1004). If, however, the user has not viewed the pre-determined number, then
the user is notified that they must first view X number of advertisements prior to
selecting a new page (Step 1006).
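
The decision flow of Steps 1000 to 1006 might be sketched as follows. This is an illustration under stated assumptions: the profile layout (an "ads_viewed" list), the threshold constant, and the function name are hypothetical and do not come from the patent text.

```python
REQUIRED_AD_VIEWS = 3  # hypothetical pre-determined number of advertisements


def handle_page_request(user_profile, requested_url, required=REQUIRED_AD_VIEWS):
    """Gate access to a new page on the number of advertisements already viewed."""
    # Step 1002: consult the previously recorded viewing information.
    viewed = len(user_profile.get("ads_viewed", ()))
    if viewed >= required:
        # Step 1004: the requested page is retrieved.
        return {"status": "ok", "page": requested_url}
    # Step 1006: notify the user how many more advertisements must be viewed.
    remaining = required - viewed
    return {
        "status": "blocked",
        "message": f"Please view {remaining} more advertisement(s) first.",
    }


profile = {"ads_viewed": ["A", "B"]}
blocked = handle_page_request(profile, "/next.html")
profile["ads_viewed"].append("C")
allowed = handle_page_request(profile, "/next.html")
```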

Detailed Description Text (52) : 

Reference now being made to FIG. 11, a flow chart is shown illustrating the use of 
the user profile database 902, and the Advertisement Database 904, to dynamically 
customize a requested web page according to prior recorded viewing preferences of a 
user in accordance with the teachings of the present invention. In the subsequent 
example, the User Profile Database 902 is used for making substitution/additions of 
advertisements which have been indicated as of interest to the user based upon the 
information contained in the User Profile Database 902. This same information could 
also be used to merely delete those advertisements which have been indicated as not 
being of particular interest to the user via the User Profile Database 902 as well. 

(57) ABSTRACT

A method for collecting statistics from a network device
configured to service requests from one or more other
devices coupled thereto includes the steps of maintaining a 
log file containing one or more entries associated with each 
request serviced by the network device; identifying a page- 
level request serviced by the network device; and generating 
statistics associated with the servicing of the page-level 
request by the network device from the log file entries 
associated with the page-level request.

9 Claims, 3 Drawing Sheets 

DOCUMENT-IDENTIFIER: US 6304904 B1

TITLE: Method and apparatus for collecting page-level performance statistics from a 
network device 



File: USPT 



Oct 16, 2001 



Brief Summary Text (6) : 

In a typical network arrangement for accessing the Internet, a plurality of client 
devices may be configured to channel requests for Internet resources, such as Web 
pages, through a network device known as a proxy, or proxy server. For example, 
proxy servers are often used to channel requests for client devices residing behind 
a so-called "firewall," or for client devices which use dial-up connections to an
Internet service provider (ISP). For a variety of reasons, it may be desirable to
collect statistics relating to the performance of such network devices, as well as 
other devices including content servers. Moreover, it may sometimes be desirable to 
collect such performance statistics at a page level (that is, compiled with respect 
to each requested Web page processed by the network device). Unfortunately, there
are no existing tools for collecting page-level statistics from such network 
devices. Instead, known monitoring tools, such as the Webstone.TM. utility 
distributed by Silicon Graphics Inc., only collect statistical information at a 
system level. While system-level statistics may be useful for some purposes, for 
many applications such statistics provide an insufficient level of detail. 
Accordingly, there is a need for a method and apparatus to collect page-level 
statistics from a network device. 
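
The difference between system-level and page-level statistics can be illustrated with a small sketch that rolls object-level log entries up into per-page totals. The entry layout below is an assumption: the "level" field, the "page_id" field used to tie objects to their page, and the byte and timing fields are hypothetical, since the text above does not define the log format.

```python
from collections import defaultdict


def page_level_stats(log_entries):
    """Combine each page-level entry with its object-level entries."""
    pages = {}
    objects = defaultdict(list)
    for entry in log_entries:
        if entry["level"] == "page":
            pages[entry["page_id"]] = entry           # the page itself
        else:
            objects[entry["page_id"]].append(entry)   # images, scripts, etc.
    stats = {}
    for page_id, page in pages.items():
        parts = [page] + objects[page_id]
        stats[page["url"]] = {
            "objects": len(objects[page_id]),
            "total_bytes": sum(part["bytes"] for part in parts),
            "total_ms": sum(part["elapsed_ms"] for part in parts),
        }
    return stats


log_entries = [
    {"level": "page", "page_id": 1, "url": "/index.html", "bytes": 4000, "elapsed_ms": 120},
    {"level": "object", "page_id": 1, "url": "/logo.gif", "bytes": 9000, "elapsed_ms": 80},
    {"level": "object", "page_id": 1, "url": "/banner.jpg", "bytes": 20000, "elapsed_ms": 150},
    {"level": "page", "page_id": 2, "url": "/about.html", "bytes": 2500, "elapsed_ms": 60},
]
stats = page_level_stats(log_entries)
```

Summing elapsed times treats the object retrievals as sequential; a proxy that fetches objects in parallel would need a different aggregation, such as the span from first request to last response.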

Current US Original Classification (1) : 
709/224 

CLAIMS:

1. A method for collecting page-level performance statistics from a network device
configured to perform transcoding services in connection with responding to
requests for web pages by client devices coupled thereto, wherein the requested web
pages include one or more associated objects, said method comprising:

servicing a request for a web page by a client device, including retrieving the
requested web page and each of its associated objects, transcoding at least one of
the retrieved web page or an associated object, and returning the web page and its
associated objects to the client device;

maintaining a log file containing a plurality of entries associated with each
request for a web page serviced by the network device, the plurality of entries
comprising a page-level entry corresponding to the web page and one or more
object-level entries corresponding to the objects associated with the web page;

identifying a page-level entry in the log file for a given web page request
serviced by the network device;

identifying each object-level entry in the log file for objects associated with the
web page; and

generating page-level performance statistics associated with the servicing of the 



(12) United States Patent

Cuomo et al. 



US006185614B1 

(10) Patent No.: US 6,185,614 B1
(45) Date of Patent: Feb. 6, 2001 



(54) METHOD AND SYSTEM FOR COLLECTING 
USER PROFILE INFORMATION OVER THE 
WORLD-WIDE WEB IN THE PRESENCE OF 
DYNAMIC CONTENT USING DOCUMENT 
COMPARATORS 

(75) Inventors: Gennaro A. Cuomo, Apex; Binh Q.
Nguyen, Cary; Sandeep K. Singhal,
Raleigh, all of NC (US)

(73) Assignee: International Business Machines 
Corp., Armonk, NY (US) 

( * ) Notice: Under 35 U.S.C. 154(b), the term of this 
patent shall be extended for 0 days. 

(21) Appl. No.: 09/084,452 

(22) Filed: May 26, 1998 

(51) Int. Cl.7 G06F 15/173; G06F 15/16; G06F 7/00

(52) U.S. Cl. 709/224; 709/203; 707/104

(58) Field of Search 709/203, 224; 707/6, 10, 104, 501, 513, 3, 5

(56) References Cited 

U.S. PATENT DOCUMENTS 

5,649,186 *   7/1997   Ferguson           707/10
5,732,218     3/1998   Bland              709/204
5,740,430 *   4/1998   Rosenberg et al.   707/200
5,745,900 *   4/1998   Burrows            707/102
5,813,007 *   9/1998   Nielsen            707/10
5,890,164 *   3/1999   Nielsen            707/201
5,892,917     4/1999   Myerson            709/204
5,893,908 *   4/1999   Cullen et al.      707/5
5,895,470 *   4/1999   Pirolli et al.     707/102
5,898,836 *   4/1999   Freivald et al.    709/218
5,909,677 *   6/1999   Broder et al.      707/3
5,913,208 *   6/1999   Brown et al.       707/3
5,941,944 *   8/1999   Messerly           709/203
5,978,842 *  11/1999   Noble et al.       709/218
5,983,268 *  11/1999   Freivald et al.    709/218
5,987,480 *  11/1999   Donohue et al.     707/501
5,999,929 *  12/1999   Goodman            707/7
6,012,087 *   1/2000   Freivald et al.    709/218

FOREIGN PATENT DOCUMENTS 

9831155 7/1998 (WO). 

OTHER PUBLICATIONS 

Brin, S., et al., "Copy Detection Mechanisms for Digital
Documents," Proc. of the 1995 ACM SIGMOD Int'l. Conf.
on Management of Data, ACM, pp. 398-409, May 1995.*
Garcia-Molina, H., et al., "dSCAM: Finding Document
Copies Across Multiple Databases," Proc. of the 4th Int'l.
Conf. on Parallel and Distributed Information Systems,
IEEE, pp. 68-79, May 1995.*

* cited by examiner

Primary Examiner — Ahmad F. Matar

Assistant Examiner — Andrew Caldwell 

(74) Attorney, Agent, or Firm — A. Bruce Clay 

(57) ABSTRACT 

Disclosed is a method and system for collecting profile 
information about users accessing dynamically generated 
content from one or more servers. In a specific embodiment, 
a server dynamically generates a web page in response to a 
user request. The server customizes the web page content 
based on the requested universal resource identifier (URI) 
and one or more of: the user's identity, access permissions, 
demographic information, and previous behavior at the site. 
The web server then passes the URI, user identity, and 
dynamically generated web page to an access information 
collector. The access information collector generates docu- 
ment comparators from the current web page content and 
compares them to document comparators associated with 
previously retrieved web pages. If the current web page is 
sufficiently similar to some previously retrieved web page, 
the access information collector logs the URI, user identity, 
and a document key associated with the matching previously 
retrieved page. Otherwise, the access information collector 
generates a new key; stores the new key and the document 
comparators in a database; and logs the URI, user identity, 
and the newly generated document key. 
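
The collector logic summarized in this abstract might be sketched as follows. The abstract does not say how document comparators are computed, so this illustration substitutes a common stand-in: hashed word shingles compared by Jaccard similarity. The class name, the 0.5 threshold, the shingle size, and the key format are all assumptions.

```python
import hashlib
import itertools


def comparators(text, k=3):
    """Stand-in document comparators: hashes of k-word shingles (an assumption)."""
    words = text.lower().split()
    shingles = {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}
    return frozenset(
        hashlib.sha1(s.encode("utf-8")).hexdigest()[:12] for s in shingles
    )


def similarity(a, b):
    """Jaccard similarity between two comparator sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class AccessInfoCollector:
    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.known = {}              # document key -> stored comparators
        self.log = []                # (URI, user identity, document key)
        self._ids = itertools.count(1)

    def record(self, uri, user, page_text):
        current = comparators(page_text)
        for key, stored in self.known.items():
            if similarity(current, stored) >= self.threshold:
                break                # sufficiently similar: reuse the existing key
        else:
            key = f"doc-{next(self._ids)}"   # no match: generate and store a new key
            self.known[key] = current
        self.log.append((uri, user, key))
        return key


collector = AccessInfoCollector()
k1 = collector.record("/home", "alice",
                      "hello alice today the top story covers web server log analysis tools")
k2 = collector.record("/home", "bob",
                      "hello bob today the top story covers web server log analysis tools")
k3 = collector.record("/cart", "bob",
                      "your shopping cart is empty please add items before checking out")
```

Here the two personalized versions of /home differ only in the greeting, so they map to the same document key, while the cart page receives a new one.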

27 Claims, 4 Drawing Sheets 

DOCUMENT-IDENTIFIER: US 6185614 B1

TITLE: Method and system for collecting user profile information over the world- 
wide web in the presence of dynamic content using document comparators 

Brief Summary Text (2) : 

This invention relates in general to computer software, and in particular to a 
method and system for collecting profile information about users accessing Web 
pages from a plurality of Web servers. More particularly, the present invention 
relates to a method and system by which user profile information can be collected
when the Web content is generated dynamically for each request at the Web server.

[57] ABSTRACT

A system and method for analyzing a Web site log file and
generating an expanded log file that compensates for infor- 
mation caching and gateway based Web site access. More 
particularly, the log file expansion procedure of the present 
invention works with a log file stored in memory on the 
server computer. The log file contains a sequence of log 
records, each log record representing an object request by a 
client computer. Each log record includes data identifying 
the requested object as well as some data, such as an Internet 
address, associated with the client computer or a gateway 
through which the client computer requested the object. The 
log expansion procedure analyzes the sequence of log 
records so as to detect object request patterns indicating that 
object requests not represented by the log records were 
satisfied by cached object copies, and then supplements the 
sequence of log records with inserted log records represent- 
ing object requests for the objects corresponding to the 
cached object copies. As a result, the supplemented 
sequence of log records more accurately represents object 
requests made by client computers than the initial sequence 
of log records in the log file. Usage metering and analysis 
procedures utilize the supplemented sequence of log
records to generate analysis reports indicative of object 
request patterns by the client computers. 

34 Claims, 6 Drawing Sheets 

Create new session if continued session criteria are not met
and add new session to session stack.
Otherwise, assign a session ID from the session stack to the
current record.

If the requested object's referrer is not in the log file, and the
referrer is in the same Web site, insert a log entry for that
object request.

If the session associated with the current record has a
previous log entry and that entry is not for the current object's
referrer, then insert log entries for the path between the
current object and the object associated with the previous
log entry, using random walk of the directed graph. Path
selection uses node weightings in graph.

Delete session IDs from session stack for those sessions
which have been inactive for a long time (e.g., 20 minutes).
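
Two of the steps above, session assignment with an inactivity timeout and insertion of a log entry for a missing same-site referrer, might be sketched as follows. This is a simplified illustration: the "continued session criteria" are assumed here to mean the same client address seen within the timeout, "not in the log file" is approximated as "not seen earlier in the log", and the weighted random-walk path insertion is omitted. The record fields and function names are hypothetical.

```python
SESSION_TIMEOUT = 20 * 60  # seconds of inactivity, following the 20-minute example


def insert_missing_referrers(records, site_prefix):
    """Insert an entry for a same-site referrer that has not appeared in the log yet."""
    seen = set()
    expanded = []
    for record in records:
        referrer = record.get("referrer")
        if referrer and referrer.startswith(site_prefix) and referrer not in seen:
            # The referring page was probably served from a cache; add a record for it.
            expanded.append({"client": record["client"], "time": record["time"],
                             "url": referrer, "referrer": None, "inserted": True})
            seen.add(referrer)
        expanded.append(record)
        seen.add(record["url"])
    return expanded


def assign_sessions(records, timeout=SESSION_TIMEOUT):
    """Attach a session ID to each record, expiring sessions idle longer than timeout."""
    active = {}   # client address -> (session ID, time last seen)
    next_id = 1
    labeled = []
    for record in sorted(records, key=lambda r: r["time"]):
        # Delete sessions that have been inactive for too long.
        stale = [c for c, (_, last) in active.items() if record["time"] - last > timeout]
        for client in stale:
            del active[client]
        if record["client"] in active:
            session_id = active[record["client"]][0]   # continue the existing session
        else:
            session_id = next_id                       # create a new session
            next_id += 1
        active[record["client"]] = (session_id, record["time"])
        labeled.append({**record, "session": session_id})
    return labeled


records = [
    {"client": "10.0.0.1", "time": 0, "url": "http://site/a.html", "referrer": None},
    {"client": "10.0.0.1", "time": 60, "url": "http://site/c.html",
     "referrer": "http://site/b.html"},
    {"client": "10.0.0.1", "time": 1360, "url": "http://site/a.html", "referrer": None},
]
expanded = insert_missing_referrers(records, "http://site/")
sessions = assign_sessions(expanded)
```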

TITLE: System for log record and log expansion with inserted log records
representing object request for specified object corresponding to cached object 
copies 

Brief Summary Text (11) : 

It is a further object of the present invention to assign object requests in the 
expanded log files to synthesized client sessions so as to represent, in a 
statistically accurate manner, the number of client sessions accessing a Web site 
and the distribution of objects accessed by those client sessions. 

Brief Summary Text (12): 

Another object of the present invention is to generate analyses of Web site usage
based on an expanded log file that represents in a statistically accurate manner
the information access patterns of the clients of the Web site.

Brief Summary Text (14): 

In summary, the present invention is a system and method for analyzing a Web site 
log file and generating an expanded log file that compensates for information 
caching and gateway based Web site access. The expanded log file represents in a 
statistically accurate manner the information access patterns of the clients of the 
Web site, although the individual synthesized client sessions represented by the 
expanded log file do not necessarily represent actual client sessions. 